Categorical Data Analysis for High-Dimensional Sparse Gene Expression Data
Abstract
Categorical data analysis becomes challenging when high-dimensional sparse covariates are involved, which is often the case for omics data. We introduce a statistical procedure based on multinomial logistic regression for such scenarios, including variable screening, model selection, order selection for response categories, and variable selection. We perform our procedure on gene expression data with 801 patients, 2426 genes, and five types of cancerous tumors. As a result, we recommend three finalized models: one with 74 genes achieves extremely low cross-entropy loss and zero predictive error rate based on five-fold cross-validation; two other models, with 31 and 4 genes, respectively, are recommended for prognostic multi-gene signatures.
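The abstract evaluates models by cross-entropy loss and predictive error rate under five-fold cross-validation. The following is a minimal sketch of how those two quantities and a five-fold split could be computed for a five-class problem. It is not the authors' implementation: the function names, the toy scores, and the labels are hypothetical, and only the sample size 801 and the number of classes come from the abstract.

```python
import math
import random

def softmax(scores):
    """Convert a list of linear scores into class probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    eps = 1e-15  # guard against log(0)
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)

def error_rate(probs, labels):
    """Fraction of samples whose most probable class is not the true class."""
    wrong = sum(1 for p, y in zip(probs, labels) if p.index(max(p)) != y)
    return wrong / len(labels)

def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Toy example (assumed values): three samples, five classes.
scores = [
    [4.0, 0.1, 0.2, 0.0, 0.3],
    [0.2, 3.5, 0.1, 0.4, 0.0],
    [0.0, 0.3, 0.1, 0.2, 2.8],
]
labels = [0, 1, 4]
probs = [softmax(s) for s in scores]
loss = cross_entropy(probs, labels)
err = error_rate(probs, labels)

# Five validation folds over 801 samples, as in the data set size quoted above.
folds = k_fold_indices(801, k=5)
```

In a full cross-validation loop, each fold would serve once as the validation set while a multinomial logistic model is fitted on the remaining four, and the loss and error rate would be averaged over the five folds.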
Similar resources
Methods for regression analysis in high-dimensional data
With advances in science, knowledge, and technology, new and precise methods for measuring, collecting, and recording information have been developed, which has led to the emergence and growth of high-dimensional data. A high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...
Subspace Clustering for High Dimensional Categorical Data
A fundamental operation in data mining is to partition a given dataset into clusters such that objects in the same cluster are more similar to each other than objects in different clusters according to some defined criteria [2]. These criteria are usually defined in the form of some distance, and similarity is hence defined as follows, the smaller the distance is, the more similar the objects a...
Sparse Quadratic Discriminant Analysis For High Dimensional Data
Many contemporary studies involve the classification of a subject into two classes based on n observations of the p variables associated with the subject. Under the assumption that the variables are normally distributed, the well-known linear discriminant analysis (LDA) assumes a common covariance matrix over the two classes while the quadratic discriminant analysis (QDA) allows different covar...
Sparse Modeling for High-Dimensional Multi-Manifold Data Analysis
High-dimensional data are ubiquitous in many areas of science and engineering, such as machine learning, signal and image processing, computer vision, pattern recognition, bioinformatics, etc. Often, high-dimensional data are not distributed uniformly in the ambient space; instead they lie in or close to a union of low-dimensional manifolds. Recovering such low-dimensional structures in the dat...
Principal Component Analysis for Sparse High-Dimensional Data
Principal component analysis (PCA) is a widely used technique for data analysis and dimensionality reduction. Eigenvalue decomposition is the standard algorithm for solving PCA, but a number of other algorithms have been proposed. For instance, the EM algorithm is much more efficient in case of high dimensionality and a small number of principal components. We study a case where the data are hi...
Journal
Journal title: Biotech
Year: 2023
ISSN: 2673-6284
DOI: https://doi.org/10.3390/biotech12030052